
    Spectral Estimation of Conditional Random Graph Models for Large-Scale Network Data

    Generative models for graphs have typically been committed to strong prior assumptions concerning the form of the modeled distributions. Moreover, the vast majority of currently available models are either only suitable for characterizing some particular network properties (such as degree distribution or clustering coefficient), or they are aimed at estimating joint probability distributions, which is often intractable in large-scale networks. In this paper, we first propose a novel network statistic, based on the Laplacian spectrum of graphs, which allows us to dispense with any parametric assumption concerning the modeled network properties. Second, we use the defined statistic to develop the Fiedler random graph model, switching the focus from the estimation of joint probability distributions to a more tractable conditional estimation setting. After analyzing the dependence structure characterizing Fiedler random graphs, we evaluate them experimentally in edge prediction over several real-world networks, showing that they reach much higher prediction accuracy than various alternative statistical models. Comment: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI 2012).
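
    The abstract does not spell out the statistic, but the Fiedler value it builds on is a standard quantity: the second-smallest eigenvalue of the graph Laplacian L = D - A, strictly positive exactly when the graph is connected. A minimal sketch of computing it (NumPy assumed; dense matrices are for illustration only, whereas the paper targets large-scale networks where sparse eigensolvers would be needed):

```python
import numpy as np

def fiedler_value(adjacency):
    """Second-smallest eigenvalue of the unnormalized Laplacian L = D - A."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return np.linalg.eigvalsh(laplacian)[1]  # eigvalsh returns eigenvalues in ascending order

# 4-node path graph 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(fiedler_value(A))  # positive, since the path is connected
```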

    On Probability Distributions for Trees: Representations, Inference and Learning

    We study probability distributions over free algebras of trees. Probability distributions can be seen as particular (formal power) tree series [Berstel et al. 82, Esik et al. 03], i.e. mappings from trees to a semiring K. A widely studied class of tree series is the class of rational (or recognizable) tree series, which can be defined either in an algebraic way or by means of multiplicity tree automata. We argue that the algebraic representation is very convenient for modeling probability distributions over a free algebra of trees. First, as in the string case, the algebraic representation allows us to design learning algorithms for the whole class of probability distributions defined by rational tree series. Note that learning algorithms for rational tree series correspond to learning algorithms for weighted tree automata where both the structure and the weights are learned. Second, the algebraic representation can easily be extended to deal with unranked trees (like XML trees, where a symbol may have an unbounded number of children). Both properties are particularly relevant for applications: nondeterministic automata are required for the inference problem to be relevant (recall that Hidden Markov Models are equivalent to nondeterministic string automata), and current applications in Web Information Extraction, Web Services and document processing consider unranked trees.
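
    To make the automaton view concrete: a rational tree series assigns to each tree the sum, over all runs of a weighted (multiplicity) tree automaton, of the product of the transition weights used. The sketch below illustrates that evaluation; the nested-tuple tree encoding, state names, and transition dictionary are my own conventions, not the paper's notation.

```python
from itertools import product

def state_weights(tree, states, transitions):
    """Map each state q to the total weight of runs on `tree` that end in q."""
    symbol, children = tree[0], tree[1:]
    child_maps = [state_weights(c, states, transitions) for c in children]
    weights = {}
    for q in states:
        total = 0.0
        # sum over all assignments of states to the children
        for assignment in product(states, repeat=len(children)):
            w = transitions.get((symbol, assignment, q), 0.0)
            for child_map, p in zip(child_maps, assignment):
                w *= child_map[p]
            total += w
        weights[q] = total
    return weights

def series_value(tree, states, transitions, final):
    """Value assigned to `tree` by the weighted tree automaton (sum over runs)."""
    return sum(final.get(q, 0.0) * w
               for q, w in state_weights(tree, states, transitions).items())

# Tiny example: binary symbol 'f', leaves 'a' and 'b', a single state 'q'.
transitions = {('a', (), 'q'): 0.6, ('b', (), 'q'): 0.4, ('f', ('q', 'q'), 'q'): 1.0}
print(series_value(('f', ('a',), ('b',)), {'q'}, transitions, {'q': 1.0}))  # 0.24
```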

    Series, Weighted Automata, Probabilistic Automata and Probability Distributions for Unranked Trees.

    We study tree series and weighted tree automata over unranked trees. The message is that recognizable tree series for unranked trees can be defined and studied through recognizable tree series for binary representations of unranked trees. For this, we prove results of Denis et al. (2007) as follows. We extend hedge automata -- a class of tree automata for unranked trees -- to weighted hedge automata. We define weighted stepwise automata as weighted tree automata for binary representations of unranked trees. We show that recognizable tree series can be equivalently defined by weighted hedge automata or weighted stepwise automata. Then we consider real-valued tree series and weighted tree automata over the field of real numbers. We show that the result also holds for probabilistic automata -- weighted automata with normalisation conditions for rules. We also define convergent tree series and show that convergence properties for recognizable tree series are preserved via binary encoding. Following Etessami and Yannakakis (2009), we present decidability results on probabilistic tree automata and algorithms for computing sums of convergent series. Last, we show that streaming algorithms for unranked trees can be seen as slight transformations of algorithms on the binary representations.
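
    The abstract does not fix a particular binary representation; a common choice paired with stepwise automata is the currying (extension) encoding, in which an unranked tree a(t1, ..., tn) becomes a left comb over a single binary symbol @. A small sketch under that assumption, using a nested-tuple representation of trees:

```python
def curry_encode(tree):
    """Curry/extension encoding of an unranked tree into a binary tree.

    An unranked tree is a nested tuple (label, child1, ..., childN).  The encoding
    turns a(t1, ..., tn) into @(@(...@(a, t1')...), tn'), where the label 'a'
    becomes a leaf and '@' is the only binary symbol.
    """
    label, children = tree[0], tree[1:]
    encoded = (label,)                      # the label itself becomes a leaf
    for child in children:
        encoded = ('@', encoded, curry_encode(child))
    return encoded

# Example: a(b, c(d), e)  ->  @(@(@(a, b), @(c, d)), e)
print(curry_encode(('a', ('b',), ('c', ('d',)), ('e',))))
```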

    Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling

    Statistical models for networks have typically been committed to strong prior assumptions concerning the form of the modeled distributions. Moreover, the vast majority of currently available models are explicitly designed for capturing some specific graph properties (such as power-law degree distributions), which makes them unsuitable for application to domains where the behavior of the target quantities is not known a priori. The key contribution of this paper is twofold. First, we introduce the Fiedler delta statistic, based on the Laplacian spectrum of graphs, which allows us to dispense with any parametric assumption concerning the modeled network properties. Second, we use the defined statistic to develop the Fiedler random field model, which allows for efficient estimation of edge distributions over large-scale random networks. After analyzing the dependence structure involved in Fiedler random fields, we estimate them over several real-world networks, showing that they achieve much higher modeling accuracy than other well-known statistical approaches.
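
    The abstract names the Fiedler delta statistic but does not define it. One plausible reading, treated here purely as an assumption, is the change in the Fiedler value when a candidate edge is toggled, which would give a parameter-free score for edge distributions. A sketch under that assumption:

```python
import numpy as np

def fiedler_value(adjacency):
    """Second-smallest eigenvalue of the unnormalized Laplacian L = D - A."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return np.linalg.eigvalsh(laplacian)[1]

def fiedler_delta(adjacency, u, v):
    """Change in the Fiedler value when edge (u, v) is toggled.

    This is only an assumed reading of the 'Fiedler delta' statistic; the paper's
    exact definition (e.g. restriction to a local subgraph) may differ.
    """
    toggled = adjacency.copy()
    toggled[u, v] = toggled[v, u] = 1.0 - toggled[u, v]
    return fiedler_value(toggled) - fiedler_value(adjacency)
```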

    Decentralized Collaborative Learning of Personalized Models over Networks

    We consider a set of learning agents in a collaborative peer-to-peer network, where each agent learns a personalized model according to its own learning objective. The question addressed in this paper is: how can agents improve upon their locally trained model by communicating with other agents that have similar objectives? We introduce and analyze two asynchronous gossip algorithms running in a fully decentralized manner. Our first approach, inspired by label propagation, aims to smooth pre-trained local models over the network while accounting for the confidence that each agent has in its initial model. In our second approach, agents jointly learn and propagate their model by making iterative updates based on both their local dataset and the behavior of their neighbors. Our algorithm to optimize this challenging objective in a decentralized way is based on ADMM.
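
    As a rough illustration of the first (model-smoothing) approach, an asynchronous gossip step can mix an agent's pre-trained model with a random neighbor's current model, weighted by its confidence and by the similarity of their objectives. The update rule below only sketches that idea; the paper's actual algorithms and their analysis are more involved, and all parameter names here are placeholders.

```python
import random
import numpy as np

def gossip_smoothing_step(models, initial_models, confidence, weights):
    """One asynchronous gossip step, loosely inspired by the paper's first approach.

    An agent wakes up, picks a random neighbor, and mixes its locally pre-trained
    model with the neighbor's current model, weighted by its confidence and by the
    similarity of their objectives.
    """
    i = random.choice(list(models))
    neighbors = [j for j in models if j != i and weights.get((i, j), 0.0) > 0.0]
    if not neighbors:
        return
    j = random.choice(neighbors)
    w, c = weights[(i, j)], confidence[i]
    models[i] = (c * initial_models[i] + w * models[j]) / (c + w)

# Two agents with similar objectives, starting from their pre-trained models.
initial = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
models = {k: v.copy() for k, v in initial.items()}
weights = {(0, 1): 1.0, (1, 0): 1.0}
confidence = {0: 1.0, 1: 1.0}
for _ in range(100):
    gossip_smoothing_step(models, initial, confidence, weights)
```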

    Hypernode Graphs for Learning from Binary Relations between Groups in Networks

    The aim of this paper is to propose methods for learning from interactions between groups in networks. In Ricatte et al. (2014) we introduced hypernode graphs, a formal model able to represent group interactions and to infer individual properties as well. Spectral graph learning algorithms were extended to the case of hypernode graphs. As a proof of concept, we showed how to model multiple-player games with hypernode graphs and that spectral learning algorithms over hypernode graphs obtain results competitive with specialized skill-rating algorithms. In this paper, we explore theoretical issues for hypernode graphs. We show that hypernode graph kernels strictly generalize graph kernels and hypergraph kernels. We show that hypernode graphs correspond to signed graphs such that the matrix D − W is positive semidefinite. It should be noted that homophilic relations between groups may lead to non-homophilic relations between individuals. Moreover, we also present some issues concerning random walks and the resistance distance for hypernode graphs.

    Pairwise-Constrained Spectral Clustering Tuned by Gaussian Kernels (original title: Clustering Spectral avec Contraintes de Paires réglées par Noyaux Gaussiens)

    We consider the problem of spectral clustering partially supervised by must-link and cannot-link constraints. Such constraints arise frequently in various problems, such as coreference resolution in natural language processing. The approach developed in this paper consists in learning a new representation space for the data, together with a new distance in that space. This representation is obtained via a linear transformation of the spectral embedding of the data. The constraints are expressed with Gaussian functions that locally adjust the similarities between objects. This yields a global, non-convex optimization problem, and the model is learned with gradient descent techniques. We evaluate our algorithm on standard datasets and compare it to various state-of-the-art algorithms, such as [14,18,32]. The results on these datasets, as well as on the CoNLL-2012 coreference task dataset, show that our algorithm significantly improves the quality of the clusters obtained by previous approaches and scales up more robustly.
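
    As a rough sketch of the constraint mechanism described above, must-link and cannot-link pairs can locally raise or lower pairwise similarities through Gaussian factors before spectral clustering is run. The function below is only an illustrative reading of the abstract (the paper actually learns a linear transformation of the spectral embedding together with a new distance); every parameter name here is a placeholder.

```python
import numpy as np

def adjust_affinities(X, W, must_link, cannot_link, sigma=1.0):
    """Locally re-weight pairwise similarities around constrained pairs.

    W is a precomputed affinity matrix over the points in X.  For each constrained
    pair (a, b), every pair (i, j) is pushed towards (must-link) or away from
    (cannot-link) by an amount that decays with a Gaussian of the distance between
    (i, j) and (a, b).  This is an assumption-laden illustration, not the paper's
    formulation.
    """
    W = W.copy()
    for (a, b), sign in [(p, +1) for p in must_link] + [(p, -1) for p in cannot_link]:
        for i in range(len(X)):
            for j in range(len(X)):
                d = np.linalg.norm(X[i] - X[a]) ** 2 + np.linalg.norm(X[j] - X[b]) ** 2
                W[i, j] += sign * np.exp(-d / (2 * sigma ** 2))
    # keep the adjusted affinities symmetric and non-negative for spectral clustering
    return np.clip((W + W.T) / 2, 0.0, None)
```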

    Hypernode Graphs for Spectral Learning on Binary Relations over Sets

    Accepted for publication at ECML/PKDD 2014. We introduce hypernode graphs as weighted binary relations between sets of nodes: a hypernode is a set of nodes, a hyperedge is a pair of hypernodes, and each node in a hypernode of a hyperedge is given a non-negative weight that represents the node's contribution to the relation. Hypernode graphs model binary relations between sets of individuals while allowing us to reason at the level of individuals. We present a spectral theory for hypernode graphs that allows us to introduce an unnormalized Laplacian and a smoothness semi-norm. In this framework, we are able to extend spectral graph learning algorithms to the case of hypernode graphs. We show that hypernode graphs are a proper extension of graphs from the expressive-power point of view and from the spectral-analysis point of view. Therefore, hypernode graphs allow us to model higher-order relations, which is not the case for hypergraphs, as shown in (Agarwal et al. 2006). In order to prove the potential of the model, we represent multiple-player games with hypernode graphs and introduce a novel method to infer skill ratings from game outcomes. We show that spectral learning algorithms over hypernode graphs obtain results competitive with specialized skill-rating algorithms such as Elo duelling and TrueSkill.
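
    The abstract lists the ingredients of the spectral theory (a gradient per hyperedge, an unnormalized Laplacian, a smoothness semi-norm) without giving formulas. One natural construction consistent with those ingredients, and the one assumed in the sketch below, stacks one gradient row per hyperedge using square roots of the node weights and takes Delta = G.T @ G, so that the semi-norm is f.T @ Delta @ f = ||G f||^2; the paper's exact definitions may differ in details.

```python
import numpy as np

def hypernode_laplacian(n_nodes, hyperedges):
    """Unnormalized Laplacian of a hypernode graph, built as Delta = G.T @ G.

    Each hyperedge is a pair of weight dictionaries (left, right) mapping nodes to
    non-negative contributions.  The gradient of a labeling f on a hyperedge is
    sum_i sqrt(left[i]) f(i) - sum_j sqrt(right[j]) f(j); stacking one such row per
    hyperedge gives the gradient matrix G.  Assumed construction, for illustration.
    """
    G = np.zeros((len(hyperedges), n_nodes))
    for row, (left, right) in enumerate(hyperedges):
        for node, w in left.items():
            G[row, node] += np.sqrt(w)
        for node, w in right.items():
            G[row, node] -= np.sqrt(w)
    return G.T @ G

# One hyperedge relating team {0, 1} to team {2, 3} with unit contributions.
delta = hypernode_laplacian(4, [({0: 1.0, 1: 1.0}, {2: 1.0, 3: 1.0})])
f = np.array([1.0, 1.0, 1.0, 1.0])
print(f @ delta @ f)  # 0.0: the two teams' weighted sums cancel for a constant labeling
```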

    Interactive Tuples Extraction from Semi-Structured Data

    This paper studies, from a machine learning viewpoint, the problem of extracting tuples of a target n-ary relation from tree-structured data such as XML or XHTML documents. Our system can extract, without any post-processing, tuples for all data structures including nested, rotated and cross tables. The wrapper induction algorithm we propose is based on two main ideas. It is incremental: partial tuples are extracted by increasing length. It is based on a representation-enrichment procedure: partial tuples of length i are encoded with the knowledge of extracted tuples of length i − 1. The algorithm is then set in a friendly interactive wrapper induction system for Web documents. We evaluate our system on several information extraction tasks over corporate Web sites. It achieves state-of-the-art results on simple data structures and succeeds on complex data structures where previous approaches fail. Experiments also show that our interactive framework significantly reduces the number of user interactions needed to build a wrapper.
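
    The incremental, representation-enriched extraction can be pictured as extending partial tuples one component at a time, with candidates for component i scored using features that encode the already-extracted prefix of length i − 1. The sketch below mirrors only that control flow; every helper name in it (candidate_nodes, enrich, component_classifiers) is a placeholder rather than the paper's API.

```python
def extract_tuples(document, component_classifiers, candidate_nodes, enrich):
    """Incremental extraction of n-ary tuples from a tree-structured document.

    Partial tuples are extended one component at a time: candidates for component i
    are scored with features enriched by the already-extracted prefix of length i-1.
    All helper names are hypothetical; the paper's feature encoding and learning
    procedure differ.
    """
    partial_tuples = [()]                       # start from the empty tuple
    for classifier in component_classifiers:    # one learned classifier per component
        extended = []
        for prefix in partial_tuples:
            for node in candidate_nodes(document):
                features = enrich(document, prefix, node)  # encode node w.r.t. prefix
                if classifier(features):
                    extended.append(prefix + (node,))
        partial_tuples = extended
    return partial_tuples
```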

    A Spectral Framework for a Class of Undirected Hypergraphs

    We extend the graph spectral framework to a new class of undirected hypergraphs with bipartite hyperedges. A bipartite hyperedge is a pair of disjoint sets of nodes in which every node is associated with a weight. A bipartite hyperedge can be viewed as a relation between two teams of nodes in which every node has a weighted contribution to its team. Undirected hypergraphs generalize undirected graphs. Consistently with the case of graphs, we define the notions of hypergraph gradient, hypergraph Laplacian, and hypergraph kernel, the latter defined as the Moore-Penrose pseudoinverse of a hypergraph Laplacian. Therefore, smooth labeling of (teams of) nodes and hypergraph regularization methods can be performed. Contrary to the graph case, we show that the class of hypergraph Laplacians is closed under the pseudoinverse operation (thus it is also the class of hypergraph kernels), and is closed under convex linear combination. These closure properties allow us to define (hyper)graph combinations and operations while keeping a hypergraph interpretation of the result. We exhibit a subclass of signed graphs that can be associated with hypergraphs in a constructive way. A hypergraph and its associated signed graph have the same Laplacian. This property allows us to define a distance between nodes in undirected hypergraphs as well as in the subclass of signed graphs. The distance coincides with the usual definition of commute-time distance when the equivalent signed graph turns out to be a graph. We claim that undirected hypergraphs open the way to solving new learning tasks and modeling new problems based on set similarity or dominance. We are currently exploring applications for modeling games between teams and for graph summarization.
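
    Taking the abstract's definition of the hypergraph kernel as the Moore-Penrose pseudoinverse of the Laplacian at face value, a node distance can be sketched as the usual kernel-induced squared distance, which on ordinary graphs reduces, up to the volume normalization, to commute-time distance, matching the abstract's claim for the graph case. A minimal sketch, assuming the Laplacian is given as a NumPy array:

```python
import numpy as np

def hypergraph_kernel(laplacian):
    """Hypergraph kernel as the Moore-Penrose pseudoinverse of the Laplacian."""
    return np.linalg.pinv(laplacian)

def node_distance(kernel, i, j):
    """Kernel-induced squared distance between nodes i and j.

    With K = pinv(L), d(i, j) = K[i, i] + K[j, j] - 2 K[i, j]; on ordinary graphs
    this is the resistance distance, i.e. commute-time distance up to the volume
    factor.  The paper's exact normalization may differ.
    """
    return kernel[i, i] + kernel[j, j] - 2.0 * kernel[i, j]
```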